CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based upon and claims priority from European Patent 
Application No. 02425469.0, filed July 19, 2002, the entire disclosure of which is herein 
incorporated by reference. 

BACKGROUND OF THE INVENTION 
1. Field of the Invention

The present invention relates to digital systems, and more specifically to a 
pipeline structure for use in a digital system. 

2. Description of Related Art

A pipeline structure consists of a sequence of functional units (stages), which 
perform a task in several steps; the stages work in parallel thus giving higher throughput 
than if all the steps had to be completed before starting a next task. Pipelines are 
commonly used in several applications, for example, to process different parts of an 
instruction in a microprocessor. 

Typically, the pipeline has a synchronous architecture. A synchronous pipeline 
receives a single clock signal, which controls all the stages. As a consequence, every 
stage must complete its work within one clock period. 

A drawback of the synchronous pipeline is that all the stages switch at the same
time. This involves high peaks of power consumption (due to the current dissipated by
the short-circuits that are formed during the switching of the transistors of the logic gates,
and to the current needed for charging and discharging wires and capacitors). These
peaks of power consumption introduce sources of noise, which can jeopardize the
functionality of the whole electronic device that embeds the pipeline. Moreover, they
impose several constraints on the design of a power supply structure; particularly, metal
tracks used to supply the electronic device (when integrated in a chip of semiconductor
material) must be dimensioned so as to withstand the aforementioned high peaks. As a
consequence, an increased area of the chip is required to integrate the electronic device.

Asynchronous pipelines have been proposed in order to reduce the peaks of power
consumption. In an asynchronous pipeline, all the stages proceed independently (so that
they do not switch at the same time). A handshaking mechanism is then used to maintain
every pair of adjacent stages in synchronization. For this purpose, each stage generates a
signal indicative of the completion of its work. This signal is used to move the result of
the stage to a next stage, and then to trigger starting of the next stage.

However, the implementation of the handshaking mechanism is relatively
complex. Moreover, an additional circuit is required to synchronize the flow of input and
output information with the external circuitry.

SUMMARY OF THE INVENTION 

It is an object of the present invention to overcome the above-mentioned
drawbacks and to provide an improved pipeline structure.

Briefly, one embodiment of the present invention provides a pipeline structure for 
use in a digital system. The pipeline structure includes stages arranged in a sequence 
from a first stage for receiving an input of the pipeline structure to a last stage for 
providing an output of the pipeline structure. At least one intermediate stage is
interposed between the first stage and the last stage. The pipeline structure also includes
a phase shifting circuit for generating at least one local clock signal for controlling the at
least one intermediate stage. The first stage and the last stage are controlled by a main
clock signal, the at least one local clock signal is generated from the main clock signal,
and the main clock signal and the at least one local clock signal are out of phase.

Moreover, embodiments of the present invention provide a digital system
including such a pipeline structure, and an electronic device including such a digital
system.

A further embodiment of the present invention provides a method of operating a
pipeline structure that includes stages arranged in a sequence. The sequence runs from a
first stage for receiving an input of the pipeline structure to a last stage for providing an
output of the pipeline structure, with at least one intermediate stage being interposed
between the first stage and the last stage. According to the method, the first stage and the
last stage are controlled with a main clock signal, and at least one local clock signal is
generated from the main clock signal. The main clock signal and the at least one local
clock signal are out of phase, and the at least one intermediate stage is controlled with the
at least one local clock signal.

Other objects, features and advantages of the present invention will become
apparent from the following detailed description. It should be understood, however, that
the detailed description and specific examples, while indicating preferred embodiments of
the present invention, are given by way of illustration only and various modifications may
naturally be performed without deviating from the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a block diagram of a hand-held computer in which the pipeline 
structure of the present invention can be used; 

Figure 2 illustrates the functional blocks of a pipeline structure according to a
preferred embodiment of the present invention; and

Figure 3 is a timing diagram showing operation of the pipeline structure of Figure 2.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Preferred embodiments of the present invention will be described in detail 
hereinbelow with reference to the attached drawings. 

Figure 1 shows a hand-held computer 100. The hand-held computer 100, also
known as a palmtop, pocket computer or Personal Digital Assistant (PDA), is a very small
system that literally fits in one hand. The hand-held computer 100 is formed by several
units, which are connected in parallel to a communication bus 105. In detail, a
microprocessor 110 controls operation of the hand-held computer 100, a DRAM 115 is
directly used as a working memory by the microprocessor 110, and a Read Only Memory
(ROM) 120 stores basic code for a bootstrap of the hand-held computer 100.

Several peripheral units are further connected to the bus 105. Particularly, a
non-volatile memory 125, typically consisting of a flash E2PROM, operates as a solid-state
mass memory for the hand-held computer 100. Moreover, the hand-held computer 100
includes input devices 130 (for example, an electronic pen or stylus), and output devices
135 (for example, a flat panel screen made with a TFT technology). Interfaces 140 are
used to connect external peripherals (such as a PCMCIA network card) to the hand-held
computer 100.

A timing unit 145 generates a main clock signal CLKm, which is used to
synchronize operation of the hand-held computer 100. A battery pack 150 provides a
power supply voltage Vdd for all the units of the hand-held computer 100, so as to enable
the hand-held computer 100 to run without being plugged in.

The microprocessor 110 has a pipeline architecture, wherein a sequence of stages
simultaneously processes different parts of every instruction to be executed by the
microprocessor 110. Particularly, a first stage fetches the instruction (from the DRAM
115), a second stage decodes the instruction, a third stage fetches the arguments (if any), a
fourth stage executes the operations required by the instruction, and a fifth stage stores a
possible result. In this way, as one instruction is executed, the next instruction is being
decoded and the one after that is being fetched. For maximum performance, the pipeline
requires a continuous stream of instructions; therefore, this technique is commonly
combined with instruction prefetch in an attempt to keep the pipeline busy.

Similar considerations apply if the hand-held computer has a different structure or
includes other units (for example, an infrared port), if the pipeline is formed by a different
number of stages, if no prefetch is implemented, if each stage performs other functions,
and the like. Alternatively, the pipeline is used in the microprocessor of a laptop
computer, in a mobile telephone, in a memory (wherein data is saved in a stack while the
next data is being accessed), or more generally in any other digital system.

Figure 2 shows a pipeline structure according to a preferred embodiment of the
present invention for use in the microprocessor of the hand-held computer. The pipeline
structure 200 is formed by N=5 stages STi (with i=1...N). Each stage STi includes a
register Ri and a combinatorial circuit Ci (except for the last stage ST5, which only has the
register R5 without any combinatorial circuit). The combinatorial circuit Ci is cascade
connected to the corresponding register Ri; the register R1 (of the first stage ST1) and the
register R5 (of the last stage ST5) define an input and an output, respectively, of the
pipeline 200.

An input word IN (for example, of 32 bits) received by the pipeline 200 is stored
in the register R1 (as a word IN1). Each register Ri (with the exception of the last one)
operates as an input buffer for the corresponding combinatorial circuit Ci. The
combinatorial circuit Ci processes a word INi provided by the register Ri, and generates a
result consisting of a word OUTi; the combinatorial circuit Ci has a propagation time Pi
(defined as the delay for obtaining the word OUTi from the word INi). The output of the
combinatorial circuit Ci is then stored in the next register Ri+1 (so that INi+1=OUTi). The
word stored in the last register R5 (OUT4) is output as the output word OUT of the
pipeline 200.

Docket No. 02-CT-099/DP
EXPRESS MAIL NO. EV343427580US

Operation of the pipeline 200 is controlled by the main clock signal CLKm.
Particularly, each register Ri has a control terminal, which is used to trigger the loading of
the word supplied at its input (word IN for the register R1 and word INi for the other
registers R2-R5). The first register R1 and the last register R5 are directly controlled by the
main clock signal CLKm. The other registers R2-R4 (of the intermediate stages ST2-ST4)
are controlled by local clock signals CLK2-CLK4, respectively. The local clock signals
CLK2-CLK4 are generated from the main clock signal CLKm using a phase shifting
circuit. This circuit has a delay block Di for each intermediate stage STi. The block Di
generates the corresponding local clock signal CLKi by applying a pre-set delay di to the
clock signal controlling the next stage STi+1; in other words, the local clock signals CLK2,
CLK3 and CLK4 are generated by delaying the clock signals CLK3, CLK4 and CLKm,
respectively. The delay blocks D2-D4 ensure that the main clock signal CLKm and every
local clock signal CLKi are out of phase, so that the registers R1-R5 never all switch at
the same time.
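
The delay chain described above can be illustrated with a minimal Python sketch. It assumes only what the text states: each local clock signal CLKi is the clock of the next stage delayed by di, and the last stage is driven by the main clock signal CLKm. The function name `local_clock_offsets` and the delay values are hypothetical and chosen purely for illustration.

```python
def local_clock_offsets(delays):
    """Return the phase offset of each local clock relative to the main clock.

    `delays` maps an intermediate stage index i (2..N-1) to its delay d_i.
    CLK_i is the clock of stage i+1 delayed by d_i, and the clock of the
    last stage is the main clock, so offset_i = d_i + d_(i+1) + ... + d_(N-1).
    """
    offsets = {}
    running = 0.0
    # Walk the chain backwards, from stage N-1 (fed by CLKm) down to stage 2.
    for i in sorted(delays, reverse=True):
        running += delays[i]
        offsets[i] = running
    return offsets


# Hypothetical delays in nanoseconds (assumed values, not from the text).
delays_ns = {2: 2.0, 3: 3.0, 4: 1.0}
print(local_clock_offsets(delays_ns))  # {4: 1.0, 3: 4.0, 2: 6.0}
```

With these assumed values, CLK4 lags the main clock by d4, CLK3 by d4+d3, and CLK2 by d4+d3+d2, which matches the chaining of the delay blocks D2-D4.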

Similar considerations apply if the pipeline includes a different number of stages
(down to three), if the word consists of a different number of bits, if the registers are
replaced with equivalent buffers, if a further combinatorial circuit is connected to the last
register, if the first register is missing, and so on.

Operation of the pipeline structure described above is shown in the simplified
timing diagram of Figure 3. The various signals are switched at the rising edge of the
respective clock signal (CLKm, CLK2-CLK4); each word is represented by a band (the
crossing points of the band define the switching times). The input word IN is loaded into
the first register R1 (word IN1) at the time T1 (in response to the rising edge of the main
clock signal CLKm). The word IN1 is processed by the combinatorial circuit C1; the
output of the combinatorial circuit C1 (word OUT1) is stored in the second register R2
(word IN2) at the next rising edge of the local clock signal CLK2 (time T1+d4+d3+d2). In
a similar manner, the output of the combinatorial circuit C2 (word OUT2) is stored in the
third register R3 (word IN3) at the next rising edge of the local clock signal CLK3 (time
T2+d4+d3). The output of the combinatorial circuit C3 (word OUT3) is likewise stored in
the fourth register R4 (word IN4) at the next rising edge of the local clock signal CLK4
(time T3+d4). The word IN4 is then processed by the combinatorial circuit C4; the output
of the combinatorial circuit C4 (word OUT4) is stored in the last register R5 (providing the
output word OUT) at the next rising edge of the main clock signal CLKm (time T4).
Therefore, three clock periods (T1-T4) are needed to pass through the entire pipeline (in
order to get the output word OUT corresponding to the input word IN).
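
The load times listed above can be reproduced with a short Python sketch. It assumes the timing of the description: the register R1 loads at T1, each register Ri with i greater than 1 loads at the main clock edge T(i-1) plus the delays di through d(N-1), and the main clock edges are spaced one period apart. The function name `load_times`, the period and the delay values are hypothetical.

```python
def load_times(t1, period, delays, n_stages=5):
    """Times at which one word is loaded into the registers R1..RN.

    R_i (i >= 2) loads at T_(i-1) + d_i + ... + d_(N-1), where
    T_k = t1 + (k - 1) * period; for the last register the sum is empty.
    """
    times = {1: t1}
    for i in range(2, n_stages + 1):
        edge = t1 + (i - 2) * period                       # main clock edge T_(i-1)
        tail = sum(delays[j] for j in range(i, n_stages))  # d_i + ... + d_(N-1)
        times[i] = edge + tail
    return times


# Hypothetical values in nanoseconds (assumed, not from the text).
period_ns = 10.0
times = load_times(t1=0.0, period=period_ns, delays={2: 2.0, 3: 3.0, 4: 1.0})
print(times)  # {1: 0.0, 2: 6.0, 3: 14.0, 4: 21.0, 5: 30.0}
print((times[5] - times[1]) / period_ns)  # 3.0 clock periods
```

Under these assumed values the word leaves the pipeline three clock periods after it entered, in agreement with the interval T1-T4 stated above.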

Correct operation of the pipeline requires that a new word is not written into a
register before the previous one has been used (by the corresponding combinatorial
circuit). Particularly, a generic word INi is supplied to the combinatorial circuit Ci as
soon as it is loaded into the corresponding register Ri. The combinatorial circuit Ci
generates the resulting word OUTi after the respective propagation time Pi. In order to
ensure that the combinatorial circuit Ci has completed its work before the word OUTi is
stored in the next register Ri+1, the difference between the switching times of the registers
Ri+1 and Ri must be greater than the propagation time Pi of the combinatorial circuit Ci.

Considering in particular the first stage ST1, the register R1 switches at every
rising edge of the main clock signal CLKm (for example, T1); the second register R2
switches at the time $T_1 + d_4 + d_3 + d_2 = T_1 + \sum_{j=2}^{N-1} d_j$. Therefore, the following relation
must be met:

$$\left(T_1 + \sum_{j=2}^{N-1} d_j\right) - T_1 > P_1$$

$$\sum_{j=2}^{N-1} d_j > P_1$$

Denoting with $T_m$ the time of a generic rising edge of the main clock signal CLKm, a
register Ri of any intermediate stage (from ST2 to ST4) switches at the time
$T_m + \sum_{j=i}^{N-1} d_j$; the next register Ri+1 switches at the time
$T_{m+1} + \sum_{j=i+1}^{N-1} d_j = T_m + T + \sum_{j=i+1}^{N-1} d_j$ (where T is the
period of the main clock signal CLKm). Therefore, the restraint applicable to every
intermediate stage is:

$$\left(T_m + T + \sum_{j=i+1}^{N-1} d_j\right) - \left(T_m + \sum_{j=i}^{N-1} d_j\right) > P_i$$

$$T - d_i > P_i$$

Finally, the register R4 switches at the time T3+d4 and the register R5 switches at the time
T4=T3+T, so that the following condition must be met for the last stage:

$$T_3 + T - (T_3 + d_4) > P_4$$

$$T - d_4 > P_4$$
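
The three conditions above lend themselves to a simple numerical check. The following Python sketch computes, for each combinatorial circuit, the margin between the available time and the propagation time; a positive margin means the corresponding condition is met. It assumes the conditions exactly as stated (sum of the delays greater than P1 for the first stage, and T minus di greater than Pi for the stages 2 to N-1). The function name `timing_margins` and all numerical values are hypothetical.

```python
def timing_margins(period, delays, prop_times):
    """Margin (available time minus propagation time) for each circuit C_i.

    `delays` maps i in 2..N-1 to d_i; `prop_times` maps i in 1..N-1 to P_i.
    First stage: (d_2 + ... + d_(N-1)) - P_1.
    Stages 2..N-1: (T - d_i) - P_i.
    """
    n = len(prop_times) + 1  # N registers, N-1 combinatorial circuits
    margins = {1: sum(delays[j] for j in range(2, n)) - prop_times[1]}
    for i in range(2, n):
        margins[i] = period - delays[i] - prop_times[i]
    return margins


# Hypothetical values in nanoseconds (assumed, not from the text).
delays_ns = {2: 2.0, 3: 3.0, 4: 1.0}
ok = timing_margins(10.0, delays_ns, {1: 5.0, 2: 7.0, 3: 6.0, 4: 8.0})
print(ok)  # {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}

# A slower first circuit (P_1 = 7.0) violates the first-stage condition.
bad = timing_margins(10.0, delays_ns, {1: 7.0, 2: 7.0, 3: 6.0, 4: 8.0})
print([i for i, m in bad.items() if m <= 0])  # [1]
```

A check of this kind could be run while choosing the delays di, since lengthening a delay helps the first stage but reduces the margin of the stage that the delay belongs to.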

Similar considerations apply if a different timing is envisaged for the pipeline, if 
the signals are strobed after two or more clock periods from their switching, if the 
difference between the switching times of the adjacent registers is greater than the clock 
period, and so on. 

More generally, the present invention proposes a pipeline structure for use in a
digital system. The pipeline structure includes a plurality of stages arranged in a
sequence from a first stage (for receiving an input of the pipeline structure) to a last stage
(for providing an output of the pipeline structure); one or more intermediate stages are
interposed between the first stage and the last stage. The first stage and the last stage are
controlled by a main clock signal. In the pipeline structure of preferred embodiments of
the present invention, phase shifting means or circuitry is provided for generating one or
more local clock signals (from the main clock signal) for controlling the intermediate
stages; the main clock signal and the local clock signals are out of phase.

The proposed solution greatly reduces the peaks of power consumption in the
pipeline structure. In this way, fewer sources of noise are introduced. Moreover, the
constraints in the design of a power supply structure for the whole electronic device that
embeds the pipeline are relaxed; particularly, metal tracks used to supply the electronic
device (when integrated in a chip of semiconductor material) may be smaller. As a
consequence, a reduced area of the chip is required to integrate the electronic device.

This result is achieved with a very simple architecture, without any handshaking
mechanism being required between the stages of the pipeline.

In addition, the pipeline structure of preferred embodiments of the present
invention maintains a synchronous interface with external circuitry (for the flow of input
and output information). Further, the proposed solution makes it possible to reduce the
number of clock periods required to pass through the entire pipeline (compared with the
conventional synchronous pipeline), even if different timings are not excluded.

The preferred embodiment of the present invention described above offers further
advantages. For example, the preferred pipeline structure has multiple intermediate
stages, each one of which is controlled by a corresponding local clock signal (with all the
local clock signals being out of phase). This feature further reduces the peaks of power
consumption (since all the intermediate stages switch at different times).
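
As a rough illustration of why staggered clocks flatten the switching peak, the Python sketch below counts how many registers share the same clock phase. It assumes, for illustration only, that every register contributes equally to the switching current, so that the largest group of registers switching together stands in for the peak; the function name `peak_simultaneous_switching` and the phase offsets are hypothetical.

```python
from collections import Counter


def peak_simultaneous_switching(phase_offsets):
    """Largest number of registers that switch at the same clock phase.

    `phase_offsets` lists, for each register, the offset of its clock
    relative to the main clock (assumed already reduced modulo the period).
    """
    return max(Counter(phase_offsets).values())


# Conventional synchronous pipeline: all five registers share one clock.
print(peak_simultaneous_switching([0.0, 0.0, 0.0, 0.0, 0.0]))  # 5

# Five-stage structure with assumed offsets for R1..R5: R1 and R5 use the
# main clock, while R2-R4 use three mutually out-of-phase local clocks.
print(peak_simultaneous_switching([0.0, 6.0, 4.0, 1.0, 0.0]))  # 2
```

Under this simplified model the largest group shrinks from five registers to two, because only the first and the last registers remain on the main clock signal.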

Preferably, each local clock signal is obtained by delaying the clock signal
controlling an adjacent stage. The proposed structure is very simple, but at the same time
effective.

As a further enhancement, each delay block preferably receives as input the clock
signal of the next stage. This solution makes it possible to ensure correct operation of the
pipeline with shorter delays (than if the local clock signals were obtained from the
previous stage).

Alternatively, the local clock signals are not all out of phase, two or more stages
are controlled by the same local clock signal, the pipeline includes a single intermediate
stage, each local clock signal is obtained by delaying another clock signal (for example,
the one controlling the previous stage), or different phase shifting means or circuitry are
envisaged.

Preferably, each intermediate stage includes a functional unit and a buffer; the
functional unit has a propagation time lower than the phase difference between the
corresponding clock signal and the clock signal controlling the next stage. This structure
better exploits the advantageous effects of the present invention (at the same time
ensuring correct operation of the pipeline).

Preferably, each stage consists of a combinatorial circuit and a corresponding
buffer (storing a word). In this way, the peaks of power consumption are reduced to the
minimum.

However, the solution according to the present invention also lends itself to being
implemented in a pipeline wherein each register consists of a stack with a depth of two or
more words, or even in a pipeline having a different architecture (for example, consisting
of a simple shift register without any combinatorial circuit).

Typically, the pipeline structure of the present invention is used in a digital
system. The improvement provided by the synchronous interface of the proposed
pipeline structure is clearly perceived in a digital system of the synchronous type.

Moreover, the solution according to the present invention is particularly
advantageous in an electronic device that is supplied by a battery (wherein the power
consumption is a very critical issue).

However, the pipeline of the present invention is also suitable for use in different
digital systems (even of the asynchronous type), and in any other electronic device (for
example, supplied by the electric mains).

While there has been illustrated and described what are presently considered to be
the preferred embodiments of the present invention, it will be understood by those skilled
in the art that various other modifications may be made, and equivalents may be
substituted, without departing from the true scope of the present invention. Additionally,
many modifications may be made to adapt a particular situation to the teachings of the
present invention without departing from the central inventive concept described herein.
Furthermore, an embodiment of the present invention may not include all of the features
described above. Therefore, it is intended that the present invention not be limited to the
particular embodiments disclosed, but that the invention include all embodiments falling
within the scope of the appended claims.